An Implemented Linearization Grammar for Japanese
Author
Abstract
Free word order languages with rich case systems and discontinuous constituents pose a serious challenge to the construction of an efficient natural language parsing system. Because Japanese allows not only considerable freedom in word order but also pervasive constituent ellipsis, it poses particular difficulties for efficient broad-coverage parsing. We have yet to see a broad-coverage Japanese grammar/parser that generates linguistically well-motivated syntactic representations for scrambled structures and interprets elided arguments. My Ph.D. project aims to develop a linguistically grounded grammar that sustains parsing efficiency while covering this range of phenomena. I have already implemented a prototype of the grammar/parser, which is documented in detail in [31] and discussed in Section 5. For this prototype I have adopted design principles and parsing mechanisms that I believe are computationally and linguistically preferable to those of existing Japanese parsing systems. My Ph.D. research will consist of extending this prototype into a full-scale parsing system by elaborating the current mechanisms and resolving a number of outstanding issues concerning word order, discontinuity and argument ellipsis. I will also incorporate research results on other languages with similar phenomena and make the parser as portable as possible, so that the study contributes to future parser-building projects not only for Japanese but for free word order, free pro-drop languages in general.
Similar resources
An Implemented Description of Japanese: The Lexeed Dictionary and the Hinoki Treebank
In this paper we describe the current state of a new Japanese lexical resource: the Hinoki treebank. The treebank is built from dictionary definition sentences, and uses an HPSG based Japanese grammar to encode both syntactic and semantic information. It is combined with an ontology based on the definition sentences to give a detailed sense level description of the most familiar 28,000 words of...
Linearization in parallel pCRL
We describe a linearization algorithm for parallel pCRL processes similar to the one implemented in the linearizer of the μCRL Toolset. This algorithm finds its roots in formal language theory: the ‘grammar’ defining a process is transformed into a variant of Greibach Normal Form. Next, any such form is further reduced to linear form, i.e., to an equation that resembles a right-linear, data-par...
Semantic tree unification grammar: a new formalism for spoken language processing
In this paper we present the Semantic Tree Unification Grammar (STUG), a new formalism for parsing spoken language. The main motivation for this formalism is to combine the robustness and simplicity of classical semantic grammar with the depth of traditional syntactic formalisms. The key properties of STUG are: the direct linearization of the semantic structure, an econo...
An implementation of Japanese Grammar based on HPSG
In this thesis, I shall show how Japanese Phrase Structure Grammar Gun877 can be implemented in LexGram KB955, an amalgam of Categorial grammar Lam61 and Head Driven Phrase Structure Grammar PS944. The thesis presents a parser which covers three characteristic phenomena in Japanese: (1) word order variation, (2) gaps in a sentence, and (3) relativization. To cover word order variation, the...
A Grammar Formalism and Parser for Linearization-based HPSG
Linearization-based HPSG theories are widely used for analyzing languages with relatively free constituent order. This paper introduces the Generalized ID/LP (GIDLP) grammar format, which supports a direct encoding of such theories, and discusses key aspects of a parser that makes use of the dominance, precedence, and linearization domain information explicitly encoded in this grammar format. W...
Journal title:
Volume, issue:
Pages: -
Publication date: 2004